
    Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm

    Information spreads across social and technological networks, but often the network structures are hidden from us and we only observe the traces left by the diffusion processes, called cascades. Can we recover the hidden network structures from these observed cascades? What kind of cascades and how many cascades do we need? Are there some network structures which are more difficult than others to recover? Can we design efficient inference algorithms with provable guarantees? Despite the increasing availability of cascade data and methods for inferring networks from these data, a thorough theoretical understanding of the above questions remains largely unexplored in the literature. In this paper, we investigate the network structure inference problem for a general family of continuous-time diffusion models using an ℓ1-regularized likelihood maximization framework. We show that, as long as the cascade sampling process satisfies a natural incoherence condition, our framework can recover the correct network structure with high probability if we observe O(d^3 log N) cascades, where d is the maximum number of parents of a node and N is the total number of nodes. Moreover, we develop a simple and efficient soft-thresholding inference algorithm, which we use to illustrate the consequences of our theoretical results, and show that our framework outperforms other alternatives in practice. Comment: To appear in the 31st International Conference on Machine Learning (ICML), 2014.
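    As a rough illustration of the kind of soft-thresholding (ISTA-style) update used for ℓ1-regularized likelihood maximization, here is a minimal sketch. The quadratic surrogate, function names, and toy data below are illustrative assumptions standing in for the paper's continuous-time diffusion likelihood, not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise soft-thresholding operator: the proximal map of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(grad_nll, alpha0, lam, step=0.1, n_iters=500):
    """Proximal-gradient loop for min_alpha NLL(alpha) + lam * ||alpha||_1.

    grad_nll: callable returning the gradient of the smooth negative
              log-likelihood at the current parameter vector.
    alpha0:   initial edge-weight vector (one entry per candidate parent).
    """
    alpha = alpha0.copy()
    for _ in range(n_iters):
        alpha = soft_threshold(alpha - step * grad_nll(alpha), step * lam)
    return alpha

# Toy usage with a least-squares surrogate in place of the diffusion likelihood.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
grad = lambda x: A.T @ (A @ x - b) / len(b)
alpha_hat = ista(grad, np.zeros(10), lam=0.1)
print("recovered support:", np.nonzero(np.abs(alpha_hat) > 1e-8)[0])
```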

    A glance into the Pathology of Covid-19, Its Current and Possible Treatments; Interleukin Antagonists as an Effective Option; A review.

    The outbreak of the novel SARS-CoV-2 and its ensuing complications has caused almost unprecedented chaos throughout the world in recent years. Although a series of vaccines have been proposed recently in order to reduce the risk of mortality and morbidity of this disease, an ultimate and reliable cure has yet to be discovered. One of the major complications of Covid-19 is the outburst of a series of inflammatory responses in the respiratory system of patients, which eventually causes hypoxemic pneumonitis and accounts for most of the mortality among Covid-19 patients. It is suggested that a group of inflammatory cytokines, such as different interleukins, is responsible for this complication; therefore, drugs which can influence this system may be useful in reducing this exaggerated inflammatory response, which has been dubbed the ‘cytokine storm’. In this article we review potential treatment options for reducing the inflammatory response and discuss some clinical trials and case reports related to drugs that interfere with the responsible interleukins in order to quench the cytokine storm.

    On the impact of activation and normalization in obtaining isometric embeddings at initialization

    In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs. In several architectures it has been observed that this Gram matrix becomes degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, play a pivotal role in preventing the rank collapse issue. Despite promising advances, the existing theoretical results (i) do not extend to layer normalization, which is widely used in transformers, and (ii) cannot characterize the bias of normalization quantitatively at finite depth. To bridge this gap, we provide a proof that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate with depth at initialization. We quantify this rate using the Hermite expansion of the activation function, highlighting the importance of higher-order (≥ 2) Hermite coefficients in the bias towards isometry.
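    To make the isometry notion concrete, the small sketch below (an illustration, not the paper's code) tracks how far the Gram matrix of a randomly initialized MLP with layer normalization is from the identity as depth grows; the widths, the tanh activation, and the isometry_gap measure are assumptions chosen for illustration.

```python
import numpy as np

def layer_norm(h, eps=1e-6):
    """Per-sample layer normalization: zero mean, unit variance across features."""
    mu = h.mean(axis=1, keepdims=True)
    sigma = h.std(axis=1, keepdims=True)
    return (h - mu) / (sigma + eps)

def isometry_gap(H):
    """Deviation of the batch's correlation-normalized Gram matrix from the identity."""
    G = H @ H.T / H.shape[1]
    G = G / np.sqrt(np.outer(np.diag(G), np.diag(G)))  # rescale to unit diagonal
    return np.linalg.norm(G - np.eye(G.shape[0]))

rng = np.random.default_rng(0)
batch, width, depth = 8, 512, 30
H = rng.normal(size=(batch, width))
H = H / np.linalg.norm(H, axis=1, keepdims=True) + 0.9  # nearly aligned inputs

for layer in range(depth):
    W = rng.normal(size=(width, width)) / np.sqrt(width)  # random linear layer
    H = layer_norm(np.tanh(H @ W))                        # activation + layer norm
    if layer % 5 == 0:
        print(f"depth {layer:2d}: ||G - I|| = {isometry_gap(H):.3f}")
```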

    Batch Normalization Orthogonalizes Representations in Deep Random Networks

    This paper underlines a subtle property of batch normalization (BN): successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth, up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representations after the linear layers contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution; furthermore, the radius of this Wasserstein ball shrinks with the width of the network. 2) In practice, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD). When representations are initially aligned, we observe that SGD wastes many iterations orthogonalizing the representations before classification. Nevertheless, we experimentally show that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.
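    The following toy sketch (an assumed setup, not the authors' experiments) passes a batch of nearly aligned inputs through random linear layers followed by batch normalization and reports a simple orthogonality gap, ||G/tr(G) - I/n||, at a few depths; the specific widths and the gap definition are illustrative choices.

```python
import numpy as np

def batch_norm(H, eps=1e-6):
    """Standardize each feature (column) across the batch, as BN does at initialization."""
    return (H - H.mean(axis=0)) / (H.std(axis=0) + eps)

def orthogonality_gap(H):
    """Distance of the batch's trace-normalized Gram matrix from a scaled identity."""
    G = H @ H.T
    return np.linalg.norm(G / np.trace(G) - np.eye(H.shape[0]) / H.shape[0])

rng = np.random.default_rng(1)
batch, width, depth = 16, 1024, 40
H = np.outer(np.ones(batch), rng.normal(size=width))      # fully aligned inputs
H += 0.01 * rng.normal(size=(batch, width))               # tiny perturbation

for layer in range(1, depth + 1):
    W = rng.normal(size=(width, width)) / np.sqrt(width)  # random linear transformation
    H = batch_norm(H @ W)                                 # followed by batch normalization
    if layer % 10 == 0:
        print(f"depth {layer:2d}: orthogonality gap = {orthogonality_gap(H):.4f}")
```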

    Transformers learn to implement preconditioned gradient descent for in-context learning

    Motivated by the striking ability of transformers for in-context learning, several works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate gradient descent iterations. Going beyond the question of expressivity, we ask: Can transformers learn to implement such algorithms by training over random problem instances? To our knowledge, we make the first theoretical progress toward this question via analysis of the loss landscape for linear transformers trained over random instances of linear regression. For a single attention layer, we prove the global minimum of the training objective implements a single iteration of preconditioned gradient descent. Notably, the preconditioning matrix not only adapts to the input distribution but also to the variance induced by data inadequacy. For a transformer with k attention layers, we prove certain critical points of the training objective implement k iterations of preconditioned gradient descent. Our results call for future theoretical studies on learning algorithms by training transformers.
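    As a concrete illustration of this correspondence, the sketch below (a hypothetical hand-set construction, not taken from the paper) fixes the key/query weights of a single linear attention head and checks numerically that its prediction for the query token equals one step of preconditioned gradient descent from zero on the in-context least-squares loss, with preconditioner P = W_Q^T W_K.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 20
X = rng.normal(size=(n, d))              # in-context inputs x_1..x_n
w_star = rng.normal(size=d)
y = X @ w_star                           # in-context labels y_i = <w*, x_i>
x_q = rng.normal(size=d)                 # query input

# Hypothetical key/query weights of a single linear attention head; the value
# head is assumed to read off only the label y_i of each context token.
W_K = rng.normal(size=(d, d))
W_Q = rng.normal(size=(d, d))

# Linear attention prediction: (1/n) * sum_i y_i * <W_K x_i, W_Q x_q>.
attn_pred = y @ (X @ W_K.T @ W_Q @ x_q) / n

# One preconditioned gradient step from w = 0 on L(w) = (1/2n) sum_i (w^T x_i - y_i)^2:
# grad L(0) = -(1/n) X^T y, so w_1 = (1/n) P X^T y and the prediction is w_1^T x_q.
P = W_Q.T @ W_K                          # preconditioner implied by the attention weights
w_1 = P @ (X.T @ y) / n
gd_pred = w_1 @ x_q

print(attn_pred, gd_pred)
assert np.isclose(attn_pred, gd_pred)    # identical up to floating point error
```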

    Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

    Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth. In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks. Leveraging tools from Markov chain theory, we derive a meaningful lower rank bound in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets. Finally, we conduct an extensive set of experiments on real-world data sets, which confirm that rank stability is indeed a crucial condition for training modern-day deep neural architectures.
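    A small illustrative experiment (an assumed setup of our own, not the paper's protocol): compare a stable-rank proxy of the hidden representations of a random ReLU network with and without batch normalization as depth grows.

```python
import numpy as np

def batch_norm(H, eps=1e-6):
    """Standardize each feature across the batch (BN at initialization, no affine part)."""
    return (H - H.mean(axis=0)) / (H.std(axis=0) + eps)

def stable_rank(H):
    """Soft rank proxy: squared Frobenius norm over squared spectral norm."""
    s = np.linalg.svd(H, compute_uv=False)
    return (s ** 2).sum() / (s[0] ** 2)

rng = np.random.default_rng(3)
batch, width, depth = 32, 256, 50
H_plain = rng.normal(size=(batch, width))
H_bn = H_plain.copy()

for layer in range(1, depth + 1):
    W = rng.normal(size=(width, width)) / np.sqrt(width)  # shared random weights
    H_plain = np.maximum(H_plain @ W, 0.0)                # ReLU, no normalization
    H_bn = np.maximum(batch_norm(H_bn @ W), 0.0)          # BN before the ReLU
    if layer % 10 == 0:
        print(f"depth {layer:2d}: stable rank  plain={stable_rank(H_plain):5.1f}  "
              f"bn={stable_rank(H_bn):5.1f}")
```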